0.1 Introduction

The goal of this study is to examine the impact of certain variables on the climate, using the Air Quality Index (AQI) of counties across the United States of America as reported in data collected by the EPA.

This presentation contains two smaller sub-studies: one examining the effects of the Climate Alliance legislative program, and another examining the correlation between county characteristics and air quality.

0.2 Reading the Data and EDA

To begin, we read in the data from the EPA datasets.

## [1] 85.3

The six most dangerous pollutants, which are also the six criteria air pollutants regulated by the EPA, are ozone, nitrogen dioxide, sulfur dioxide, lead, carbon monoxide, and particulate matter.

The plots show that the concentrations of this pollutant have gradually decreased over time or, in a few cases, remained the same. The regions where the data vary significantly are the West, the Southwest, and the Rockies.

0.3 Climate Alliance

## 
## Call:
## lm(formula = mean.state ~ is.climate.alli + Year, data = mean.state.df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -18.67  -3.11   0.64   3.05  45.96 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1171.265    167.203    7.01  7.3e-12 ***
## is.climate.alliyes    0.955      0.526    1.81     0.07 .  
## Year                 -0.564      0.083   -6.79  2.9e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.15 on 547 degrees of freedom
## Multiple R-squared:  0.0829, Adjusted R-squared:  0.0795 
## F-statistic: 24.7 on 2 and 547 DF,  p-value: 5.31e-11
## 
## Call:
## lm(formula = delta.aqi.state ~ is.climate.alli + Year, data = mean.state.df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -33.58  -1.26   0.06   1.36  13.30 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)         81.8044    88.4278    0.93     0.36
## is.climate.alliyes  -0.1220     0.2528   -0.48     0.63
## Year                -0.0409     0.0439   -0.93     0.35
## 
## Residual standard error: 2.82 on 497 degrees of freedom
##   (50 observations deleted due to missingness)
## Multiple R-squared:  0.00221,    Adjusted R-squared:  -0.00181 
## F-statistic: 0.55 on 2 and 497 DF,  p-value: 0.577

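In the first model, the estimate for is.climate.alliyes is positive (0.955): after controlling for year, Climate Alliance states have a slightly higher mean AQI than other states. Because a higher AQI indicates worse air quality, this does not show better air in member states, and the difference is not significant at the 5% level (p = 0.07).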

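This might be because the Climate Alliance only went into effect three years ago, in 2017. Note that in the second model Climate Alliance states show slightly better improvements in AQI on average (estimate -0.122 for delta.aqi.state), although this difference is also not significant (p = 0.63).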
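The two regressions above can be reproduced in outline as follows. This is a minimal sketch on randomly generated data: only the column names mean.state, delta.aqi.state, is.climate.alli, and Year come from the model output, while the year range, the membership assignment, the simulated AQI values, and the definition of delta.aqi.state as a within-state year-over-year difference are all assumptions made for illustration.

```r
# Synthetic stand-in for mean.state.df: 50 hypothetical states over 11 years.
set.seed(1)
years  <- 2009:2019                      # assumed year range
states <- paste0("S", 1:50)              # placeholder state labels
mean.state.df <- expand.grid(state = states, Year = years,
                             stringsAsFactors = FALSE)

# Hypothetical membership flag: the first 24 synthetic states are members.
mean.state.df$is.climate.alli <- factor(
  ifelse(mean.state.df$state %in% states[1:24], "yes", "no"),
  levels = c("no", "yes"))

# Simulated mean AQI with a downward time trend (values are made up).
mean.state.df$mean.state <- 40 - 0.5 * (mean.state.df$Year - 2009) +
  rnorm(nrow(mean.state.df), sd = 6)

# Assumed definition: year-over-year change within each state.
# Each state's first year has no predecessor, so it is NA and lm() drops it.
mean.state.df <- mean.state.df[order(mean.state.df$state, mean.state.df$Year), ]
mean.state.df$delta.aqi.state <- ave(
  mean.state.df$mean.state, mean.state.df$state,
  FUN = function(x) c(NA, diff(x)))

fit.level <- lm(mean.state ~ is.climate.alli + Year, data = mean.state.df)
fit.delta <- lm(delta.aqi.state ~ is.climate.alli + Year, data = mean.state.df)

# Coefficient tables: estimate, standard error, t value, p value.
print(coef(summary(fit.level)))
print(coef(summary(fit.delta)))
```

With 50 states and 11 years, this layout has 550 rows, of which 50 have a missing delta.aqi.state, which gives the same residual degrees of freedom (547 and 497) as the output above. A negative is.climate.alliyes estimate in the second fit would mean a larger average year-over-year drop in AQI for member states.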

0.4 County Level Effects on AQI

Using data from the USDA’s Economic Research Service (ERS), we look for county-level predictors of air quality and the correlations among them. This begins by merging the 2019 AQI data with the latest USDA ERS data. We use 2019 rather than 2020 data so that the 2020 West Coast fires do not skew the results.


To begin the analysis, we first merge all three sets of ERS county data with one another, and then merge the result with the AQI data by county and state. Only AQI data from 2019 are used, which keeps the comparison consistent and avoids the 2020 data skewed by the fires on the West Coast.
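The two-stage merge described above might look like the following in base R. This is only a sketch: the table names (ers.a, ers.b, ers.c, aqi), the key columns state and county, and every value in the tiny data frames are hypothetical placeholders, with a few column names borrowed from the output in this section.

```r
# Three made-up ERS-style county tables that share the keys state and county.
keys  <- data.frame(state  = c("CA", "CA", "TX"),
                    county = c("Alameda", "Kern", "Travis"))
ers.a <- cbind(keys, UnempRate2019 = c(3.0, 7.8, 2.7))
ers.b <- cbind(keys, FemaleHHPct   = c(11.2, 13.5, 10.1))
ers.c <- cbind(keys, Deep_Pov_All  = c(4.1, 8.9, 5.0))

# Step 1: merge the ERS tables with one another on the shared keys.
ers.all <- Reduce(function(x, y) merge(x, y, by = c("state", "county")),
                  list(ers.a, ers.b, ers.c))

# Made-up AQI table containing more than one year.
aqi <- data.frame(state   = c("CA", "CA", "CA", "NY"),
                  county  = c("Alameda", "Kern", "Kern", "Kings"),
                  Year    = c(2019, 2019, 2020, 2019),
                  med.aqi = c(41, 68, 75, 44))

# Step 2: keep 2019 only, then join to the ERS data by county and state.
aqi.2019  <- aqi[aqi$Year == 2019, ]
county.df <- merge(ers.all, aqi.2019, by = c("state", "county"))
print(county.df)
```

merge() performs an inner join by default, so counties that appear in only one of the two sources are dropped. Merging on both state and county matters because county names repeat across states.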


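We split the cleaned and merged dataset into a predictor matrix X and a response vector Y for use with cv.glmnet. We call set.seed(1) so that the random cross-validation folds, and therefore the results, are reproducible.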
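A minimal sketch of this split is shown here. The data frame is randomly generated and reuses only a few column names from this section; the choice of alpha = 1 (LASSO), the ten folds, and the use of lambda.1se are assumptions, since the text does not state which settings were used. The glmnet call is guarded so that the split itself runs even where the package is not installed.

```r
set.seed(1)  # fixes the random number stream, and with it the CV fold assignment

# Made-up county data; med.aqi is the response, the rest are predictors.
n <- 200
county.df <- data.frame(
  med.aqi             = rnorm(n, mean = 40, sd = 10),
  UnempRate2019       = runif(n, 2, 10),
  FemaleHHPct         = runif(n, 5, 20),
  LandAreaSQMiles2010 = runif(n, 100, 5000))

# model.matrix() builds a numeric design matrix (expanding any factors into
# dummy columns); the first column is the intercept, which glmnet adds itself.
X <- model.matrix(med.aqi ~ ., data = county.df)[, -1]
Y <- county.df$med.aqi

if (requireNamespace("glmnet", quietly = TRUE)) {
  # Assumed settings: LASSO penalty with 10-fold cross-validation.
  cv.fit <- glmnet::cv.glmnet(X, Y, alpha = 1, nfolds = 10)
  # Coefficients at the largest lambda within one standard error of the
  # minimum cross-validated error; zeroed coefficients are dropped predictors.
  print(coef(cv.fit, s = "lambda.1se"))
}
```

cv.glmnet expects a numeric matrix rather than a data frame, which is why the split into X and Y is needed before the call.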

## Anova Table (Type II tests)
## 
## Response: med.aqi
##                                            Sum Sq  Df F value  Pr(>F)    
## UnempRate2020                                  26   1    0.25 0.61643    
## PctEmpChange1920                              119   1    1.14 0.28520    
## UnempRate2019                                 757   1    7.26 0.00719 ** 
## UnempRate2017                                 845   1    8.10 0.00451 ** 
## PctEmpAgriculture                              60   1    0.58 0.44821    
## PctEmpMining                                    0   1    0.00 0.98589    
## PctEmpConstruction                            227   1    2.17 0.14060    
## PctEmpManufacturing                             6   1    0.05 0.81835    
## PctEmpTrans                                     1   1    0.01 0.93250    
## UnempRate2012                                  15   1    0.15 0.70224    
## UnempRate2009                                 336   1    3.22 0.07303 .  
## PopChangeRate1819                               8   1    0.08 0.77593    
## NetMigrationRate1019                          388   1    3.72 0.05400 .  
## NaturalChangeRate1019                         600   1    5.75 0.01671 *  
## Net_International_Migration_Rate_2010_2019    128   1    1.23 0.26829    
## NetMigrationRate0010                          259   1    2.48 0.11541    
## NaturalChangeRate0010                         286   1    2.74 0.09808 .  
## Immigration_Rate_2000_2010                     25   1    0.24 0.62643    
## BlackNonHispanicPct2010                       387   1    3.71 0.05451 .  
## AsianNonHispanicPct2010                         1   1    0.01 0.90762    
## NativeAmericanNonHispanicPct2010              194   1    1.86 0.17286    
## MultipleRacePct2010                            28   1    0.27 0.60379    
## NonHispanicBlackPopChangeRate0010             592   1    5.67 0.01743 *  
## NonHispanicAsianPopChangeRate0010             508   1    4.87 0.02758 *  
## HispanicPopChangeRate0010                     365   1    3.50 0.06169 .  
## MultipleRacePopChangeRate0010                  22   1    0.22 0.64263    
## WhiteNonHispanicNum2010                       503   1    4.82 0.02830 *  
## MultipleRaceNum2010                           207   1    1.98 0.15933    
## ForeignBornEuropePct                           85   1    0.81 0.36834    
## ForeignBornMexPct                             300   1    2.88 0.09028 .  
## Ed1LessThanHSPct                              268   1    2.57 0.10931    
## Ed2HSDiplomaOnlyPct                           619   1    5.93 0.01503 *  
## Ed3SomeCollegePct                               0   1    0.00 0.98653    
## Ed4AssocDegreePct                            1264   1   12.12 0.00052 ***
## FemaleHHPct                                  1200   1   11.50 0.00072 ***
## HH65PlusAlonePct                              233   1    2.23 0.13535    
## ForeignBornCaribPct                             0   1    0.00 0.96018    
## ForeignBornAfricaNum                          253   1    2.42 0.11981    
## ForeignBornMexNum                            3091   1   29.63 6.6e-08 ***
## LandAreaSQMiles2010                             7   1    0.07 0.79113    
## Deep_Pov_All                                  115   1    1.10 0.29457    
## Residuals                                  100680 965                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We then remove the least relevant variables to see which factors remain.
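The text does not say how relevance was judged. One possibility, sketched here on made-up data, is backward elimination: repeatedly drop the term with the largest p-value until every remaining term falls below a threshold. The 0.05 cutoff, the predictor names x1 to x4, and the simulated response are assumptions. For a model without interactions, drop1(fit, test = "F") reports the same marginal F tests as a Type II Anova table.

```r
set.seed(1)

# Made-up data: only x1 and x2 truly affect the response.
n  <- 300
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
df$med.aqi <- 40 + 3 * df$x1 - 2 * df$x2 + rnorm(n)

fit <- lm(med.aqi ~ ., data = df)

repeat {
  tests <- drop1(fit, test = "F")      # marginal F test for each term
  p <- tests[["Pr(>F)"]][-1]           # first row is "<none>", so skip it
  names(p) <- rownames(tests)[-1]
  if (length(p) == 0 || max(p) <= 0.05) break
  worst <- names(p)[which.max(p)]      # least relevant remaining term
  fit <- update(fit, as.formula(paste(". ~ . -", worst)))
}

print(formula(fit))
```

Dropping one term at a time and refitting matters because the p-values of the remaining terms change whenever a correlated predictor is removed.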


##  [1] -2.17e+00 -1.93e+00  1.68e+00 -2.23e+00  3.59e+00  3.14e+00 -3.39e+00
##  [8]  9.90e+00  8.89e-06  6.73e-05  2.04e-01  1.84e-01  1.68e-01  6.36e-02
## [15]  1.17e-01  1.60e-01  2.35e-05  1.64e-06  5.59e-03  4.29e-03  3.53e-03
## [22]  6.94e-02  4.57e-02  1.73e-01  4.87e-02  1.48e-01  6.97e-01  7.19e-01
## [29]  3.82e+00  5.72e-05

From the final model, we see that most of the impact on AQI is geographical. For example, the increase in AQI associated with ForeignBornMexNum and NetMigrationNum could signal that states closer to the Mexican border tend to have worse AQIs because of their location. However, the clearest predictors are the states themselves.

The linearity assumptions appear to hold, except for observations more than about one standard deviation below the mean.

##                                            IncNodePurity
## UnempRate2020                                       1758
## PctEmpChange1920                                    2757
## UnempRate2019                                       1666
## UnempRate2017                                       1800
## PctEmpAgriculture                                   2646
## PctEmpMining                                        2029
## PctEmpConstruction                                  1628
## PctEmpManufacturing                                 1559
## PctEmpTrans                                         1878
## UnempRate2012                                       1669
## UnempRate2009                                       1723
## PopChangeRate1819                                   1379
## NetMigrationRate1019                                1808
## NaturalChangeRate1019                               1746
## Net_International_Migration_Rate_2010_2019          2202
## NetMigrationRate0010                                1896
## NaturalChangeRate0010                               2041
## Immigration_Rate_2000_2010                          2332
## BlackNonHispanicPct2010                             2349
## AsianNonHispanicPct2010                             2058
## NativeAmericanNonHispanicPct2010                    1699
## MultipleRacePct2010                                 1825
## NonHispanicBlackPopChangeRate0010                   1911
## NonHispanicAsianPopChangeRate0010                   2178
## HispanicPopChangeRate0010                           1929
## MultipleRacePopChangeRate0010                       1809
## WhiteNonHispanicNum2010                             2958
## MultipleRaceNum2010                                 2800
## ForeignBornEuropePct                                1575
## ForeignBornMexPct                                   2342
## Ed1LessThanHSPct                                    1837
## Ed2HSDiplomaOnlyPct                                 2553
## Ed3SomeCollegePct                                   1971
## Ed4AssocDegreePct                                   3102
## FemaleHHPct                                         1913
## HH65PlusAlonePct                                    2132
## ForeignBornCaribPct                                 1513
## ForeignBornAfricaNum                                2123
## ForeignBornMexNum                                   2621
## LandAreaSQMiles2010                                 2009
## Deep_Pov_All                                        1913
## $names
##  [1] "call"            "type"            "predicted"       "mse"            
##  [5] "rsq"             "oob.times"       "importance"      "importanceSD"   
##  [9] "localImportance" "proximity"       "ntree"           "mtry"           
## [13] "forest"          "coefs"           "y"               "test"           
## [17] "inbag"           "terms"          
## 
## $class
## [1] "randomForest.formula" "randomForest"


0.5 Conclusion

The overall objective of this study was to use the AQI of counties across the USA to determine the impact of certain variables on the climate. Using data collected by the EPA, we focused on two questions: whether the Climate Alliance has curbed the deterioration of the AQI across the nation, and how county characteristics correlate with air quality.

From this study, we conclude that the Climate Alliance has not yet had a measurable effect on the AQI of member states: neither the difference in average AQI between member and non-member states nor the difference in their year-over-year change is statistically significant at the 5% level. We also see that most of the impact on the AQI is geographical, based on the significant variables of the model.